Loading required package: car
Loading required package: carData
Attaching package: 'car'
The following object is masked from 'package:dplyr':
recode
The following object is masked from 'package:purrr':
some
Loading required package: effects
lattice theme set by effectsTheme()
See ?effectsTheme for details.
Code
library(smss)
Warning: package 'smss' was built under R version 4.2.2
#1(b)
The prediction equation is ŷ = -10536 + 53.8x1 + 2.84x2, where x1 is home size and x2 is lot size, both in square feet. For a fixed lot size, the predicted selling price increases by $53.80 for each one-square-foot increase in home size. Holding lot size fixed makes 2.84x2 a constant in the prediction equation, so any change in ŷ comes only from the 53.8x1 term.
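The effect of home size with lot size held fixed can be checked numerically. The sketch below is illustrative only: `predict_price` is a helper name introduced here, and the example values (a 1,240 sq ft home on an 18,000 sq ft lot that actually sold for $145,000) are the ones used in part 1(a) of this homework's source.

```r
# Prediction equation from question 1: y-hat = -10536 + 53.8*x1 + 2.84*x2
# predict_price is a hypothetical helper, not part of the original homework.
predict_price <- function(home_size, lot_size) {
  -10536 + 53.8 * home_size + 2.84 * lot_size
}

# Part 1(a) values: 1240 sq ft home, 18000 sq ft lot, actual price 145000
predicted <- predict_price(1240, 18000)
residual <- 145000 - predicted   # residual = actual - predicted

# Part 1(b): one extra square foot of home size, lot size held fixed
effect_home <- predict_price(1241, 18000) - predict_price(1240, 18000)

predicted
residual
effect_home
```

Because the lot-size term is identical in both calls, it cancels in the difference, which leaves only the home-size coefficient.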
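#1(c) For a fixed home size, the lot-size increase x2 that has the same effect as one extra square foot of home size solves 53.8 * 1 = 2.84 * x2.
Code
x2 <- 53.8 / 2.84
x2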
[1] 18.94366
An increase in lot size of about 18.94 square feet would have the same impact on the predicted selling price as an increase of 1 square foot in home size.
#2
Code
# salary data set from the alr4 package
data("salary")
salary
degree rank sex year ysdeg salary
1 Masters Prof Male 25 35 36350
2 Masters Prof Male 13 22 35350
3 Masters Prof Male 10 23 28200
4 Masters Prof Female 7 27 26775
5 PhD Prof Male 19 30 33696
6 Masters Prof Male 16 21 28516
7 PhD Prof Female 0 32 24900
8 Masters Prof Male 16 18 31909
9 PhD Prof Male 13 30 31850
10 PhD Prof Male 13 31 32850
11 Masters Prof Male 12 22 27025
12 Masters Assoc Male 15 19 24750
13 Masters Prof Male 9 17 28200
14 PhD Assoc Male 9 27 23712
15 Masters Prof Male 9 24 25748
16 Masters Prof Male 7 15 29342
17 Masters Prof Male 13 20 31114
18 PhD Assoc Male 11 14 24742
19 PhD Assoc Male 10 15 22906
20 PhD Prof Male 6 21 24450
21 PhD Asst Male 16 23 19175
22 PhD Assoc Male 8 31 20525
23 Masters Prof Male 7 13 27959
24 Masters Prof Female 8 24 38045
25 Masters Assoc Male 9 12 24832
26 Masters Prof Male 5 18 25400
27 Masters Assoc Male 11 14 24800
28 Masters Prof Female 5 16 25500
29 PhD Assoc Male 3 7 26182
30 PhD Assoc Male 3 17 23725
31 PhD Asst Female 10 15 21600
32 PhD Assoc Male 11 31 23300
33 PhD Asst Male 9 14 23713
34 PhD Assoc Female 4 33 20690
35 PhD Assoc Female 6 29 22450
36 Masters Assoc Male 1 9 20850
37 Masters Asst Female 8 14 18304
38 Masters Asst Male 4 4 17095
39 Masters Asst Male 4 5 16700
40 Masters Asst Male 4 4 17600
41 Masters Asst Male 3 4 18075
42 PhD Asst Male 3 11 18000
43 Masters Assoc Male 0 7 20999
44 Masters Asst Female 3 3 17250
45 Masters Asst Male 2 3 16500
46 Masters Asst Male 2 1 16094
47 Masters Asst Female 2 6 16150
48 Masters Asst Female 2 2 15350
49 Masters Asst Male 1 1 16244
50 Masters Asst Female 1 1 16686
51 Masters Asst Female 1 1 15000
52 Masters Asst Female 0 2 20300
#2(a)
Code
summary(lm(salary ~ sex, data = salary))
Call:
lm(formula = salary ~ sex, data = salary)
Residuals:
Min 1Q Median 3Q Max
-8602.8 -4296.6 -100.8 3513.1 16687.9
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24697 938 26.330 <2e-16 ***
sexFemale -3340 1808 -1.847 0.0706 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5782 on 50 degrees of freedom
Multiple R-squared: 0.0639, Adjusted R-squared: 0.04518
F-statistic: 3.413 on 1 and 50 DF, p-value: 0.0706
Here the null hypothesis is that mean salaries for men and women are equal; the alternative hypothesis is that they differ. The sexFemale coefficient is -3340, so without controlling for any other variable, women's mean salary is estimated to be $3,340 lower than men's. However, the p-value is 0.0706, which is above the 0.05 significance level, so we fail to reject the null hypothesis. Therefore, at the 5% level we cannot conclude that mean salaries differ between men and women.
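As a rough check on this test decision, the two-sided p-value can be recomputed from the reported t statistic and residual degrees of freedom. This is a minimal sketch using only the rounded values printed in the summary output, so the result matches the reported p-value only approximately.

```r
# Values copied from the summary output above (rounded as printed)
t_value <- -1.847   # t statistic for sexFemale
df_resid <- 50      # residual degrees of freedom

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)
p_value

# Test decisions at the 5% and 10% significance levels
reject_05 <- p_value < 0.05
reject_10 <- p_value < 0.10
reject_05
reject_10
```

The null hypothesis is not rejected at the 5% level, although it would be at the more lenient 10% level.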
#2(b)
Code
model <- lm(salary ~ ., data = salary)
summary(model)
Call:
lm(formula = salary ~ ., data = salary)
Residuals:
Min 1Q Median 3Q Max
-4045.2 -1094.7 -361.5 813.2 9193.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15746.05 800.18 19.678 < 2e-16 ***
degreePhD 1388.61 1018.75 1.363 0.180
rankAssoc 5292.36 1145.40 4.621 3.22e-05 ***
rankProf 11118.76 1351.77 8.225 1.62e-10 ***
sexFemale 1166.37 925.57 1.260 0.214
year 476.31 94.91 5.018 8.65e-06 ***
ysdeg -124.57 77.49 -1.608 0.115
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared: 0.855, Adjusted R-squared: 0.8357
F-statistic: 44.24 on 6 and 45 DF, p-value: < 2.2e-16
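Using confint(model), the 95% confidence interval for the sexFemale coefficient (the female minus male difference in salary, controlling for the other variables) is (-697.82, 3030.56). Because the interval contains 0, the difference is not statistically significant at the 5% level.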
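The interval can be reproduced by hand from the coefficient table. The sketch below assumes the usual t-based interval, estimate ± t × standard error, and uses the rounded estimate and standard error printed above rather than the fitted model object.

```r
# Values copied from the coefficient table for sexFemale
estimate <- 1166.37
std_error <- 925.57
df_resid <- 45   # residual degrees of freedom of the full model

# Critical value for a 95% two-sided interval
t_crit <- qt(0.975, df = df_resid)

# Lower and upper confidence limits
ci <- estimate + c(-1, 1) * t_crit * std_error
ci
```

The limits agree with the reported interval up to rounding of the inputs.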
#2(c)
degreePhD: A faculty member with a PhD has a predicted salary $1,388.61 higher than a faculty member without a PhD, holding the other variables fixed. This coefficient is not statistically significant (p = 0.180).
Rank: The baseline category is assistant professor. Compared with an assistant professor, an associate professor's predicted salary is $5,292.36 higher and a full professor's is $11,118.76 higher. Both differences are statistically significant at the 0.0001 alpha level.
Sex: A female faculty member's predicted salary is $1,166.37 higher than a male's. However, this coefficient is not statistically significant at any standard alpha level (p = 0.214).
Year: For each additional year in current rank, a faculty member's predicted salary increases by $476.31. The coefficient is significant at the 0.0001 alpha level.
ysdeg: For each additional year since the degree was earned, predicted salary decreases by $124.57. However, this coefficient is not significant at any standard alpha level (p = 0.115).
#2(d)
Code
salary$rank <- relevel(salary$rank, ref = 'Assoc')
summary(lm(salary ~ degree + rank + sex + year + ysdeg, data = salary))
Call:
lm(formula = salary ~ degree + rank + sex + year + ysdeg, data = salary)
Residuals:
Min 1Q Median 3Q Max
-4045.2 -1094.7 -361.5 813.2 9193.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 21038.41 1109.12 18.969 < 2e-16 ***
degreePhD 1388.61 1018.75 1.363 0.180
rankAsst -5292.36 1145.40 -4.621 3.22e-05 ***
rankProf 5826.40 1012.93 5.752 7.28e-07 ***
sexFemale 1166.37 925.57 1.260 0.214
year 476.31 94.91 5.018 8.65e-06 ***
ysdeg -124.57 77.49 -1.608 0.115
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2398 on 45 degrees of freedom
Multiple R-squared: 0.855, Adjusted R-squared: 0.8357
F-statistic: 44.24 on 6 and 45 DF, p-value: < 2.2e-16
The baseline category is now Assoc. According to these coefficients, assistant professors are predicted to make $5,292.36 less than associate professors, and full professors are predicted to make $5,826.40 more than associate professors, holding the other variables fixed.
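Changing the baseline only re-expresses the same fitted model, so the new rank coefficients follow from the old ones by subtraction. The sketch below illustrates `relevel()` on a small made-up factor (not the salary data) and then checks the arithmetic with the coefficients reported in the two summaries.

```r
# Toy factor for illustration only; these values are not from the salary data
rank <- factor(c("Asst", "Assoc", "Prof", "Asst"),
               levels = c("Asst", "Assoc", "Prof"))
levels(rank)                            # first level is the baseline

rank2 <- relevel(rank, ref = "Assoc")   # move "Assoc" to the front
levels(rank2)

# Coefficients reported with Asst as the baseline
assoc_vs_asst <- 5292.36
prof_vs_asst <- 11118.76

# Implied coefficients with Assoc as the baseline
asst_vs_assoc <- -assoc_vs_asst
prof_vs_assoc <- prof_vs_asst - assoc_vs_asst
asst_vs_assoc
prof_vs_assoc
```

The residuals, R-squared, and all non-rank coefficients are unchanged between the two fits, which is consistent with the two outputs shown.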
#2(e)
Code
summary(lm(salary ~ degree + sex + year + ysdeg, data=salary))
Call:
lm(formula = salary ~ degree + sex + year + ysdeg, data = salary)
Residuals:
Min 1Q Median 3Q Max
-8146.9 -2186.9 -491.5 2279.1 11186.6
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17183.57 1147.94 14.969 < 2e-16 ***
degreePhD -3299.35 1302.52 -2.533 0.014704 *
sexFemale -1286.54 1313.09 -0.980 0.332209
year 351.97 142.48 2.470 0.017185 *
ysdeg 339.40 80.62 4.210 0.000114 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3744 on 47 degrees of freedom
Multiple R-squared: 0.6312, Adjusted R-squared: 0.5998
F-statistic: 20.11 on 4 and 47 DF, p-value: 1.048e-09
With rank removed, the sexFemale coefficient changes sign: females are now predicted to make $1,286.54 less than males, compared with $1,166.37 more when rank was included. However, this difference is not significant at any standard alpha level (p = 0.332).
#2(f)
Code
# mutate() and case_when() come from dplyr
salary <- mutate(salary, hired = case_when(ysdeg < 15 ~ "new", ysdeg >= 15 ~ "old"))
summary(lm(salary ~ degree + sex + rank + hired + year, data = salary))
Call:
lm(formula = salary ~ degree + sex + rank + hired + year, data = salary)
Residuals:
Min 1Q Median 3Q Max
-3588.0 -1532.2 -232.2 565.7 9132.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20468.7 951.7 21.507 < 2e-16 ***
degreePhD 1073.5 843.3 1.273 0.2096
sexFemale 1046.7 858.0 1.220 0.2289
rankAsst -5012.5 1002.3 -5.001 9.16e-06 ***
rankProf 6213.3 1045.0 5.946 3.76e-07 ***
hiredold -2421.6 1187.9 -2.038 0.0474 *
year 450.7 81.5 5.530 1.55e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2360 on 45 degrees of freedom
Multiple R-squared: 0.8597, Adjusted R-squared: 0.841
F-statistic: 45.95 on 6 and 45 DF, p-value: < 2.2e-16
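According to this model, faculty hired by the old dean (ysdeg of 15 or more) are predicted to make $2,421.60 less than faculty hired by the new dean, controlling for the other variables. This difference is significant at the 0.05 alpha level (p = 0.0474). ysdeg was excluded after creating the hired variable to avoid multicollinearity, since hired is derived directly from ysdeg.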
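The grouping rule behind the hired variable can also be written in base R. This is a minimal sketch on made-up ysdeg values, shown only to make the cutoff explicit; `ifelse()` is swapped in here for the `case_when()` call used in the homework.

```r
# Hypothetical years-since-degree values, for illustration only
ysdeg <- c(4, 14, 15, 30)

# Same cutoff as in the homework: fewer than 15 years counts as "new"
hired <- ifelse(ysdeg < 15, "new", "old")
hired

# Group sizes under this rule
table(hired)
```

Note that a value of exactly 15 falls in the "old" group under this rule.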
#3(a)
Code
summary(lm(Price ~ Size + New, data = house.selling.price))
Call:
lm(formula = Price ~ Size + New, data = house.selling.price)
Residuals:
Min 1Q Median 3Q Max
-205102 -34374 -5778 18929 163866
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -40230.867 14696.140 -2.738 0.00737 **
Size 116.132 8.795 13.204 < 2e-16 ***
New 57736.283 18653.041 3.095 0.00257 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 53880 on 97 degrees of freedom
Multiple R-squared: 0.7226, Adjusted R-squared: 0.7169
F-statistic: 126.3 on 2 and 97 DF, p-value: < 2.2e-16
According to the coefficient for Size, the predicted price of a house increases by $116.13 for each one-square-foot increase in size, holding new/old status fixed. The coefficient is significant at the 0.0001 alpha level (p < 2e-16), indicating a strong positive association between size and price.
According to the coefficient for New, a new house is predicted to cost $57,736.28 more than an old house of the same size. This coefficient is significant at the 0.01 level (p = 0.00257), so new/old status has a positive effect on price beyond what size explains.
#3(b)
Y = -40230.867 + 116.132(X1) + 57736.283(X2), where X1 is size in square feet and X2 is 1 for a new house and 0 for an old house.
For a new house: Y = -40230.867 + 116.132(size) + 57736.283
For an old house: Y = -40230.867 + 116.132(size)
#3(c)
Code
# New house, 3000 square feet
size <- 3000
-40230.867 + (116.132 * size) + 57736.283
[1] 365901.4
Code
# Old house, 3000 square feet
size <- 3000
-40230.867 + (116.132 * size)
[1] 308165.1
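Both predictions come from the same equation with New set to 1 or 0, which can be wrapped in one function. In this sketch `predict_main` is a hypothetical helper name; the coefficients are the ones from the Price ~ Size + New fit.

```r
# Main-effects model: Price = -40230.867 + 116.132*Size + 57736.283*New
# predict_main is a hypothetical helper, not part of the original homework.
predict_main <- function(size, new) {
  -40230.867 + 116.132 * size + 57736.283 * new
}

price_new <- predict_main(3000, new = 1)   # new 3000 sq ft home
price_old <- predict_main(3000, new = 0)   # old 3000 sq ft home

price_new
price_old
price_new - price_old   # equals the New coefficient at every size
```

Without an interaction term, the gap between new and old homes is the same constant at any size.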
#3(d)
Code
summary(lm( Price~ Size*New, data = house.selling.price))
Call:
lm(formula = Price ~ Size * New, data = house.selling.price)
Residuals:
Min 1Q Median 3Q Max
-175748 -28979 -6260 14693 192519
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -22227.808 15521.110 -1.432 0.15536
Size 104.438 9.424 11.082 < 2e-16 ***
New -78527.502 51007.642 -1.540 0.12697
Size:New 61.916 21.686 2.855 0.00527 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 52000 on 96 degrees of freedom
Multiple R-squared: 0.7443, Adjusted R-squared: 0.7363
F-statistic: 93.15 on 3 and 96 DF, p-value: < 2.2e-16
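#3(e) Based on the regression that includes the interaction between Size and New, the predicted selling prices are:
For a new house: Y = -22227.81 + 104.44 * Size - 78527.50 * 1 + 61.92 * Size * 1 = -100755.31 + 166.36 * Size
For an old house: Y = -22227.81 + 104.44 * Size
#3(f) For a 3000 square foot home, the predicted price is $398,306.69 if new and $291,086.19 if not new (computed with the unrounded coefficients).
#3(g) For a 1500 square foot home, the predicted price is $148,775.69 if new and $134,429.19 if not new. The gap between new and old homes is -78527.50 + 61.92 * Size, so it grows with size: about $14,346 at 1500 square feet versus about $107,220 at 3000 square feet. Each additional square foot adds more to the predicted price of a new home (about $166.35) than of an old home (about $104.44).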
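The widening gap can be seen by evaluating the interaction model at several sizes. In this sketch `predict_int` is a hypothetical helper name; the coefficients are the unrounded ones from the Price ~ Size * New fit, and the two sizes are the ones considered in this homework.

```r
# Interaction model:
# Price = -22227.808 + 104.438*Size - 78527.502*New + 61.916*Size*New
# predict_int is a hypothetical helper, not part of the original homework.
predict_int <- function(size, new) {
  -22227.808 + 104.438 * size - 78527.502 * new + 61.916 * size * new
}

sizes <- c(1500, 3000)
new_price <- predict_int(sizes, new = 1)
old_price <- predict_int(sizes, new = 0)

# Predicted prices and the new-minus-old gap at each size
data.frame(size = sizes, new = new_price, old = old_price,
           gap = new_price - old_price)
```

The gap column depends only on the New and Size:New coefficients, so it changes linearly with size.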
#3(h)
I prefer the second model, which includes the interaction term. The interaction is statistically significant (p = 0.00527), showing that each additional square foot adds more to the predicted price of a new home than of an old one. The model with the interaction term also has a larger adjusted R-squared (0.7363 versus 0.7169).
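The adjusted R-squared values can be recovered from the reported R-squared values, which shows that the improvement is not just a result of adding a parameter. This sketch assumes the standard adjustment formula and n = 100 observations (implied by 97 residual degrees of freedom with three estimated coefficients); `adj_r2` is a helper name introduced here.

```r
# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1),
# where p is the number of predictors excluding the intercept.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

adj_main <- adj_r2(0.7226, n = 100, p = 2)   # Price ~ Size + New
adj_int <- adj_r2(0.7443, n = 100, p = 3)    # Price ~ Size * New

adj_main
adj_int
```

Both values agree with the summary outputs up to rounding of the reported R-squared.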
Code
# For a 1000 sq ft home, new:
-22227.808 + 166.354 * 1000 - 78527.502
[1] 65598.69
Code
# For a 1000 sq ft home, not new:
-22227.808 + 104.438 * 1000
[1] 82210.19
I wouldn't use this model for small homes: for a home that is 1000 square feet, the predicted price for a new house is lower than for an old house, which is implausible. So I don't think this model would be good at predicting prices for very small houses.
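The size at which the interaction model's new and old predictions cross can be computed directly, since the new-minus-old gap depends only on the New and Size:New coefficients. A minimal sketch, with `gap` as a helper name introduced here:

```r
# Coefficients from the Price ~ Size * New fit
b_new <- -78527.502   # New main effect
b_int <- 61.916       # Size:New interaction

# New-minus-old difference in predicted price at a given size
gap <- function(size) b_new + b_int * size

# Size where the two predictions are equal
crossover <- -b_new / b_int
crossover

gap(1000)   # negative: the new home is predicted to be cheaper
gap(3000)   # positive: the new home is predicted to cost more
```

Below roughly 1,268 square feet the model predicts that a new home sells for less than an old one of the same size, which is the range where its predictions look unreliable.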
Homework 4, by Kaushika Potluri, 14/11/2022.